IN THE CLAIMS:

1. (currently amended) A speech recognition system comprising:

an acoustic detector for detecting speech utterances of a speaker using an audio 
input device; 

a visual detector for detecting at least one facial characteristic associated with 
speech utterances of the speaker; 

a processing arrangement connected to be responsive to the acoustic and visual 
detectors for deriving a signal having first and second values respectively indicative of 
the speaker making and not making speech utterances such that the first value is derived 
in response to the acoustic detector detecting a finite, nonzero acoustic response while the 
visual detector detects at least one facial characteristic associated with speech utterance 
of the speaker, said processing arrangement comprising a circular buffer for continuously 
receiving and maintaining a most recent time period of the acoustic
response supplied to said audio input device, said time period having a duration
corresponding to that predefined for a typical speech utterance; and

a speech recognizer for deriving an output indicative of the speech utterances as 
detected only by the acoustic detector, the speech recognizer being connected to be 
responsive to the acoustic detector in response to the signal having the first value. 
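The circular buffer recited in claim 1 can be illustrated with a minimal Python sketch. It assumes the acoustic response arrives as fixed-length frames of samples; the class name `RecentAudioBuffer`, its parameters, and the frame-based layout are hypothetical choices for illustration and are not taken from the claims.

```python
from collections import deque
from typing import Deque, List, Sequence


class RecentAudioBuffer:
    """Circular buffer keeping only the most recent `duration_s` seconds of audio.

    Hypothetical sketch: `duration_s` stands in for the duration predefined
    for a typical speech utterance.
    """

    def __init__(self, duration_s: float, sample_rate: int, frame_len: int) -> None:
        # Number of frame stages needed to cover the predefined duration, rounded up.
        n_samples = int(round(duration_s * sample_rate))
        n_frames = -(-n_samples // frame_len)
        # A deque with maxlen drops the oldest frame whenever a new one is appended.
        self._frames: Deque[List[float]] = deque(maxlen=n_frames)

    @property
    def capacity(self) -> int:
        """Number of frame stages in the buffer."""
        return self._frames.maxlen or 0

    def push(self, frame: Sequence[float]) -> None:
        """Continuously receive a new frame, overwriting the oldest one when full."""
        self._frames.append(list(frame))

    def snapshot(self) -> List[float]:
        """Return the buffered samples, oldest first."""
        return [sample for frame in self._frames for sample in frame]
```

For example, a 2 s buffer at 16 kHz with 160-sample (10 ms) frames has 200 stages; once full, each new frame silently displaces the oldest one, so the buffer always holds the most recent time period.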

2. (original) The speech recognition system of claim 1 wherein the processing 
arrangement causes the signal to have the second value in response to any of: 

(a) the acoustic detector not detecting a finite, nonzero acoustic response while 
the visual detector does not detect speech utterances of the speaker, 
(b) the acoustic detector detecting a finite, nonzero acoustic response while the
visual detector does not detect speech utterances of the speaker, and 

(c) the acoustic detector not detecting a finite, nonzero acoustic response while 
the visual detector detects speech utterances of the speaker. 
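The gating condition of claims 1 and 2 amounts to a logical AND of the two detectors. A minimal Python sketch, assuming the acoustic detector reduces each frame to a single level value and the visual detector to a boolean; the names `gate_signal` and `Gate` and the optional `floor` threshold are hypothetical.

```python
import math
from enum import Enum


class Gate(Enum):
    FIRST_VALUE = 1   # speaker is making a speech utterance: couple the recognizer
    SECOND_VALUE = 2  # speaker is not making a speech utterance: keep it decoupled


def gate_signal(acoustic_level: float, face_shows_speech: bool, floor: float = 0.0) -> Gate:
    """Derive the signal from the acoustic and visual detector outputs."""
    # The acoustic response must be finite and above a (hypothetical) noise floor.
    acoustic_present = math.isfinite(acoustic_level) and abs(acoustic_level) > floor
    if acoustic_present and face_shows_speech:
        return Gate.FIRST_VALUE
    # Cases (a), (b) and (c) of claim 2 all end up here.
    return Gate.SECOND_VALUE
```

Only one of the four detector combinations yields the first value; the other three correspond to cases (a) to (c) of claim 2.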

3.-5. (canceled) 

6. (previously presented) The speech recognition system of claim 1 wherein the 
circular buffer assures that the beginning of each speech utterance is coupled to the 
speech recognizer. 

7. (previously presented) The speech recognition system of claim 6 wherein the 
circular buffer is connected to be responsive to the acoustic detector, the circular buffer 
including a plurality of stages for storing sequential segments of the output of the 
acoustic detector, the delay arrangement being such that the contents of the memory 
element stage storing the beginning of a speech utterance are initially coupled to the 
speech recognizer. 

8. (canceled) 

9. (previously presented) The speech recognition system of claim 1 wherein the 
processing arrangement includes a delay arrangement for assuring that in response to 
completion of each speech utterance the acoustic detector is decoupled from the speech 
recognizer.

10.-11. (canceled)

12. (original) The speech recognition system of claim 1 wherein the processing 
arrangement includes a face recognizer connected to be responsive to the visual detector. 

13. (previously presented) The speech recognition system of claim 12 wherein the 
face recognizer is arranged for enabling the signal to have the first value in response to 
the face of the speaker being at a predetermined orientation relative to the visual detector. 

14. (previously presented) The speech recognition system of claim 13 wherein the 
face recognizer is arranged for: 

(1) detecting and distinguishing the faces of a plurality of speakers, and 

(2) enabling the signal to have the first value in response to the speaker having a 
recognized face. 

15. (previously presented) The speech recognition system of claim 14 wherein the 
processing arrangement includes a speaker identity recognizer connected to be responsive 
to the acoustic detector, the speaker identity recognizer being arranged for: 

(1) detecting and distinguishing speech patterns of a plurality of speakers, and 

(2) enabling the signal to have the first value in response to the speaker having a 
recognized speech pattern. 
16. (previously presented) The speech recognition system of claim 15 wherein the
processing arrangement is arranged for causing the signal to have the first value in 
response to the speaker having a recognized face matched with a recognized speech 
pattern of the same speaker. 

17. (previously presented) The speech recognition system of claim 1 wherein the 
processing arrangement includes a face recognizer connected to be responsive to the
visual detector and a speaker identity recognizer connected to be responsive to the
acoustic detector, the face recognizer being arranged for detecting and distinguishing the 
faces of a plurality of speakers, the speaker identity recognizer being arranged for 
detecting and distinguishing speech patterns of a plurality of speakers, the processing 
arrangement being arranged for enabling the signal to have the first value only in 
response to the speaker having a recognized face matched with a recognized speech 
pattern of the same speaker. 

18. (currently amended) A method of recognizing speech utterances of a speaker 
with an automatic speech recognizer only responsive to acoustic speech utterances of the 
speaker comprising:

predefining a time duration corresponding to a typical speech utterance; 

detecting acoustic energy having a spectrum associated with speech utterances,

continuously receiving and maintaining a most recent time period of the acoustic
energy, said period having said duration,

detecting at least one facial characteristic associated with speech utterances of the
speaker, and 

activating the automatic speech recognizer in response to the detected acoustic 
energy having a spectrum associated with speech utterances while the at least one facial 
characteristic associated with the speech utterances of the speaker is occurring. 

19. (original) The method of claim 18 further comprising preventing activation of 
the automatic speech recognizer in response to any of: 

(a) no acoustic energy having a spectrum associated with speech utterances being 
detected while no facial characteristic associated with speech utterances of the speaker is 
detected, 

(b) acoustic energy having a spectrum associated with speech utterances being 
detected while no facial characteristic associated with speech utterances of the speaker is 
detected, and 

(c) no acoustic energy having a spectrum associated with speech utterances being 
detected while at least one facial characteristic associated with speech utterances of the 
speaker is detected. 

20. (original) The method of claim 18 further comprising assuring that the 
beginning of each speech utterance is coupled to the speech recognizer. 

21. (original) The method of claim 20 wherein the beginning of each speech
utterance is assuredly coupled to the speech recognizer by: 

(a) delaying the speech utterance, 

(b) recognizing the beginning of each speech utterance, and 

(c) responding to the recognized beginning of each speech utterance to couple the 
delayed speech utterance associated with the beginning of each speech utterance to the 
speech recognizer and thereafter sequentially coupling the remaining delayed speech 
utterances to the speech recognizer. 
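Steps (a) to (c) of claim 21 resemble what is commonly called a pre-roll buffer. The Python sketch below assumes the audio has already been split into segments and that `is_speech` is some per-segment decision (for example, the combined acoustic and visual gate) that tends to fire only after the true onset; the function name, the `preroll` stage count, and this particular onset rule are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, TypeVar

T = TypeVar("T")


def couple_with_preroll(
    segments: Iterable[T], is_speech: Callable[[T], bool], preroll: int
) -> List[T]:
    """Return the segments that would be coupled to the speech recognizer."""
    # Delay line (step a): the most recent segments not yet coupled.
    history: Deque[T] = deque(maxlen=preroll)
    coupled: List[T] = []
    active = False
    for seg in segments:
        speech = is_speech(seg)
        if speech and not active:
            # Beginning recognized (step b): couple the delayed segments first (step c).
            coupled.extend(history)
            history.clear()
            active = True
        elif not speech and active:
            active = False  # utterance over; resume filling the delay line
        if active:
            coupled.append(seg)  # thereafter couple the remaining segments in order
        else:
            history.append(seg)
    return coupled
```

With `preroll` set to cover the detector's typical latency, a soft onset that the detector misses is still delivered to the recognizer ahead of the detected speech.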

22. (original) The method of claim 18 further comprising assuring that no 
detected acoustic energy is coupled to the speech recognizer upon the completion of a 
speech utterance. 

23. (original) The method of claim 22 wherein assurance that no detected acoustic 
energy is coupled to the speech recognizer upon the completion of a speech utterance is 
provided by: (a) delaying the acoustic energy associated with the speech utterance, (b) 
recognizing the completion of each speech utterance, and (c) responding to the 
recognized completion of each speech utterance to decouple delayed acoustic energy 
occurring after the completion of each speech utterance from the speech recognizer. 
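The decoupling of claim 23 can be sketched in Python under one assumed end-point rule: completion is recognized only after `hangover` consecutive non-speech frames. Delaying the audio by the same number of frames means those trailing frames are still in the delay line when completion is recognized, so they can be dropped instead of coupled. The function name, `is_speech`, and the hangover rule are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, TypeVar

T = TypeVar("T")


def couple_until_completion(
    frames: Iterable[T], is_speech: Callable[[T], bool], hangover: int
) -> List[T]:
    """Return the frames of one utterance that reach the recognizer."""
    if hangover < 1:
        raise ValueError("hangover must be at least one frame")
    delay_line: Deque[T] = deque()  # (a) acoustic frames delayed by `hangover` stages
    coupled: List[T] = []
    started = False
    silent_run = 0
    for frame in frames:
        speech = is_speech(frame)
        started = started or speech
        if not started:
            continue  # nothing is coupled before the utterance begins
        silent_run = 0 if speech else silent_run + 1
        delay_line.append(frame)
        if len(delay_line) > hangover:
            coupled.append(delay_line.popleft())  # release the oldest delayed frame
        if silent_run >= hangover:
            # (b) completion recognized; (c) the frames still delayed arrived after
            # the utterance ended, so they are never coupled.
            return coupled
    # Stream ended before completion was recognized: release what is still delayed.
    coupled.extend(delay_line)
    return coupled
```

Short pauses inside the utterance (fewer than `hangover` frames) are passed through; only the silence that triggered the end-point decision is withheld.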

24. (original) The method of claim 18 wherein the at least one facial characteristic 
indicates the face of the speaker has a predetermined orientation relative to a detector 
involved in the step of detecting the at least one facial characteristic. 

25. (previously presented) The method of claim 24 further comprising
distinguishing the face of the speaker from a plurality of speakers, distinguishing the 
speech pattern of the speaker from a plurality of speakers, and activating the automatic 
speech recognizer in response to the speaker having a recognized face matched with a 
recognized speech pattern of the same speaker. 

26. (previously presented) The method of claim 18 further comprising 
distinguishing the face of the speaker from a plurality of speakers, distinguishing the 
speech pattern of the speaker from a plurality of speakers, and activating the automatic 
speech recognizer in response to the speaker having a recognized face matched with a 
recognized speech pattern of the same speaker. 

27. (previously presented) The method of claim 26 further including storing: 

(1) images of the faces of a plurality of speakers, and 

(2) the speech patterns of the same plurality of speakers during at least one 
training period; and performing the distinguishing steps by comparing the stored images 
and speech patterns with images of the face of the speaker and the speech pattern of the 
speaker. 
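Claims 25 to 27 describe storing face and speech templates during training and later activating the recognizer only when both match the same speaker. The sketch below assumes face images and speech patterns have already been reduced to fixed-length feature vectors and uses cosine similarity with a fixed threshold as a stand-in comparison; neither the feature extraction nor this matching rule comes from the claims, and all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EnrolledSpeakers:
    """Face and voice templates stored during a training period, keyed by speaker."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # hypothetical minimum similarity for a match
        self.faces: Dict[str, List[float]] = {}
        self.voices: Dict[str, List[float]] = {}

    def enroll(self, name: str, face: Sequence[float], voice: Sequence[float]) -> None:
        self.faces[name] = list(face)
        self.voices[name] = list(voice)

    def _identify(self, templates: Dict[str, List[float]], probe: Sequence[float]) -> Optional[str]:
        # Distinguish among enrolled speakers: best-scoring template at or above threshold.
        best_name: Optional[str] = None
        best_score = self.threshold
        for name, template in templates.items():
            score = cosine(template, probe)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name

    def activate(self, face: Sequence[float], voice: Sequence[float]) -> bool:
        """True only if the face and the voice both match the same enrolled speaker."""
        face_id = self._identify(self.faces, face)
        return face_id is not None and face_id == self._identify(self.voices, voice)
```

A recognized face paired with another enrolled speaker's voice, or a face that matches no one, leaves the recognizer inactive.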

28. (new) The system of claim 1, further comprising a segmentor, and, for the 
continuous maintaining, a circular buffer having a number of stages sufficiently large to 
store sequential speech segments said segmentor would derive from said typical speech
utterance. 

29. (new) The method of claim 18, further comprising providing a segmentor, 
and, for the continuous maintaining, a circular buffer having a number of stages 
sufficiently large to store sequential speech segments said segmentor would derive from 
said typical speech utterance. 

30. (new) A speech recognition system comprising: 

an acoustic detector for detecting speech utterances of a speaker using an audio 
input device; 

a visual detector for detecting at least one facial characteristic associated with 
speech utterances of the speaker; 

a processing arrangement comprising a circular buffer and connected to be 
responsive to the acoustic and visual detectors for deriving a signal having first and 
second values respectively indicative of the speaker making and not making speech 
utterances such that the first value is derived in response to the acoustic detector 
detecting, while the visual detector detects at least one facial characteristic associated 
with speech utterance of the speaker, that an acoustic response supplied to said audio 
input device and currently stored in said circular buffer is finite and nonzero, said circular 
buffer being configured for continuously receiving and maintaining a most recent time 
period of said acoustic response to be subject to said detecting by the acoustic detector, 
said time period having a duration corresponding to that predefined for a typical speech
utterance; and 

a speech recognizer for deriving an output indicative of the speech utterances as 
detected only by the acoustic detector, the speech recognizer being connected to be 
responsive to the acoustic detector in response to the signal having the first value. 


31. (new) The method of claim 18, wherein said detecting of current occurrence 
of at least one facial characteristic associated with speech utterances pertains to speech 
utterances of the speaker, said activating occurs in response to said detecting that acoustic 
energy currently has a spectrum associated with speech utterances being concurrent with 
said detecting of current occurrence of said at least one facial characteristic
associated with speech utterances of the speaker. 